Other Methods
from pyod.models.lof import LOF
LOF[@breunig2000lof]
LOF(n_neighbors=20,
    algorithm='auto',
    leaf_size=30,
    metric='minkowski',
    p=2,
    metric_params=None,
    contamination=0.1,
    n_jobs=1,
    novelty=True)
Parameter | Description | Default Value |
---|---|---|
n_neighbors | Number of neighbors to use by default for kneighbors queries. If n_neighbors is larger than the number of samples provided, all samples will be used. | 20 |
algorithm | Algorithm used to compute the nearest neighbors. Options: ‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’. ‘auto’ attempts to decide the most appropriate algorithm based on the values passed to the fit method. | ‘auto’ |
leaf_size | Leaf size passed to BallTree or KDTree. Affects the speed of construction and query, as well as the memory required to store the tree. Optimal value depends on the nature of the problem. | 30 |
metric | Metric used for distance computation. Options include various distances from scikit-learn and scipy.spatial.distance. If ‘precomputed’, X is expected to be a distance matrix. | ‘minkowski’ |
p | Parameter for the Minkowski metric. For p = 1, equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. | 2 |
metric_params | Additional keyword arguments for the metric function. | None |
contamination | The amount of contamination of the data set, i.e., the proportion of outliers. Used to define the threshold on the decision function. | 0.1 |
n_jobs | The number of parallel jobs to run for neighbors search. If -1, the number of jobs is set to the number of CPU cores. Affects only kneighbors and kneighbors_graph methods. | 1 |
novelty | Set to True if using LOF for novelty detection. Use predict, decision_function, and score_samples only on new unseen data, not on the training set. | True |
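All detectors listed in this section share the PyOD fit/score interface. A minimal sketch with LOF (the toy data, parameter choices, and variable names are illustrative, not taken from the original experiments):

```python
import numpy as np
from pyod.models.lof import LOF

# Toy data: dense inliers around the origin plus a few scattered points.
rng = np.random.RandomState(42)
X_train = np.vstack([rng.randn(200, 2), rng.uniform(-6, 6, size=(10, 2))])
X_test = rng.randn(20, 2)

clf = LOF(n_neighbors=20, contamination=0.1)
clf.fit(X_train)

train_scores = clf.decision_scores_          # raw outlier scores on the training data
train_labels = clf.labels_                   # 0 = inlier, 1 = outlier, thresholded via contamination
test_scores = clf.decision_function(X_test)  # scores for unseen data
test_labels = clf.predict(X_test)            # binary predictions for unseen data
```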
from pyod.models.knn import KNN
kNN[@ramaswamy2000efficient]
KNN(contamination=0.1,
    n_neighbors=5,
    method='largest',
    radius=1.0,
    algorithm='auto',
    leaf_size=30,
    metric='minkowski',
    p=2,
    metric_params=None,
    n_jobs=1,
    **kwargs)
Parameter | Description | Default |
---|---|---|
contamination | Proportion of outliers in the data set, used to define the threshold on the decision function. | 0.1 |
n_neighbors | Number of neighbors to use for k neighbors queries. | 5 |
method | Method for kNN detection: ‘largest’, ‘mean’, or ‘median’. | ‘largest’ |
radius | Range of parameter space for radius_neighbors queries. | 1.0 |
algorithm | Algorithm to compute nearest neighbors: ‘auto’, ‘ball_tree’, ‘kd_tree’, or ‘brute’. | ‘auto’ |
leaf_size | Leaf size passed to BallTree, affecting construction/query speed and memory. | 30 |
metric | Metric for distance computation, from scikit-learn or scipy.spatial.distance. | ‘minkowski’ |
p | Parameter for Minkowski metric, equivalent to manhattan_distance (l1) for p = 1 and euclidean_distance (l2) for p = 2. | 2 |
metric_params | Additional keyword arguments for the metric function. | None |
n_jobs | Number of parallel jobs for neighbors search. -1 uses CPU cores. | 1 |
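As a sketch of how the `method` parameter changes the score, the three variants can be fit on the same data and compared (data and settings are illustrative):

```python
import numpy as np
from pyod.models.knn import KNN

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(300, 3), rng.uniform(-8, 8, size=(15, 3))])

scores = {}
for method in ('largest', 'mean', 'median'):
    clf = KNN(n_neighbors=5, method=method, contamination=0.05)
    clf.fit(X)
    scores[method] = clf.decision_scores_
# 'largest' scores by the distance to the k-th neighbor alone; 'mean' and
# 'median' aggregate over all k neighbor distances, which is typically more
# robust to a single unusually close neighbor.
```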
from pyod.models.cblof import CBLOF
CBLOF[@he2003discovering]
CBLOF(n_clusters=8,
      contamination=0.1,
      clustering_estimator=None,
      alpha=0.9,
      beta=5,
      use_weights=False,
      check_estimator=False,
      random_state=None,
      n_jobs=1)
Parameter | Description | Default |
---|---|---|
n_clusters | Number of clusters to form and centroids to generate. | 8 |
contamination | Amount of contamination in the data set, proportion of outliers. Used to define threshold. | 0.1 |
clustering_estimator | Base clustering algorithm for data clustering. Requires fit() and predict(). Default is KMeans. | None |
alpha | Coefficient for deciding small and large clusters. | 0.9 |
beta | Coefficient for deciding small and large clusters. | 5 |
use_weights | Use cluster sizes as weights in outlier score calculation. | False |
check_estimator | Check if base estimator is consistent with sklearn standard. | False |
random_state | Seed for random number generator. | None |
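When `clustering_estimator` is left at None, CBLOF falls back to KMeans; any clusterer exposing `fit()` and `predict()` can be substituted. A minimal sketch with an explicit scikit-learn KMeans (the data `X` and the fixed seed are illustrative):

```python
from sklearn.cluster import KMeans
from pyod.models.cblof import CBLOF

# Keep the clusterer's n_clusters consistent with CBLOF's n_clusters.
clf = CBLOF(n_clusters=8,
            clustering_estimator=KMeans(n_clusters=8, random_state=42),
            alpha=0.9,
            beta=5,
            use_weights=True,
            contamination=0.1)
clf.fit(X)                    # X: a numeric feature matrix, as in the LOF sketch above
scores = clf.decision_scores_
```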
from pyod.models.ocsvm import OCSVM
OCSVM[@manevitz2001one]
OCSVM(kernel='rbf',
      degree=3,
      gamma='auto',
      coef0=0.0,
      tol=0.001,
      nu=0.5,
      shrinking=True,
      cache_size=200,
      verbose=False,
      max_iter=-1,
      contamination=0.1)
Parameter | Description | Default Value |
---|---|---|
kernel | Specifies the kernel type to be used in the algorithm. Options: ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’, or a callable. If none is given, ‘rbf’ will be used. | ‘rbf’ |
nu | An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default, 0.5 will be taken. | 0.5 |
degree | Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. | 3 |
gamma | Kernel coefficient for ‘rbf’, ‘poly’, and ‘sigmoid’. If gamma is ‘auto’, then 1/n_features will be used instead. | ‘auto’ |
coef0 | Independent term in the kernel function. Only significant in ‘poly’ and ‘sigmoid’. | 0.0 |
tol | Tolerance for stopping criterion. | 0.001 |
shrinking | Whether to use the shrinking heuristic. | True |
cache_size | Specify the size of the kernel cache (in MB). | 200 |
verbose | Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm. | False |
max_iter | Hard limit on iterations within the solver, or -1 for no limit. | -1 |
contamination | The amount of contamination of the data set, i.e., the proportion of outliers. Used when fitting to define the threshold on the decision function. | 0.1 |
from pyod.models.mcd import MCD
MCD[@hardin2004outlier]
MCD(contamination=0.1,
    store_precision=True,
    assume_centered=False,
    support_fraction=None,
    random_state=None)
Parameter | Description | Default |
---|---|---|
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
store_precision | bool Specify if the estimated precision is stored. | True |
assume_centered | bool If True, the support of the robust location and the covariance estimates is computed, and a covariance estimate is recomputed from it, without centering the data. Useful to work with data whose mean is significantly equal to zero but is not exactly zero. If False, the robust location and covariance are directly computed with the FastMCD algorithm without additional treatment. | False |
support_fraction | float, 0 < support_fraction < 1 The proportion of points to be included in the support of the raw MCD estimate. Default is None, which implies that the minimum value of support_fraction will be used within the algorithm: (n_samples + n_features + 1) / 2. | None |
random_state | int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. | None |
from pyod.models.feature_bagging import FeatureBagging
FeatureBagging[@lazarevic2005feature]
FeatureBagging(base_estimator=None,
               n_estimators=10,
               contamination=0.1,
               max_features=1.0,
               bootstrap_features=False,
               check_detector=True,
               check_estimator=False,
               n_jobs=1,
               random_state=None,
               combination='average',
               verbose=0,
               estimator_params=None)
Parameter | Description | Default |
---|---|---|
base_estimator | The base estimator to fit on random subsets of the dataset. If None, base estimator is LOF detector. | None |
n_estimators | The number of base estimators in the ensemble. | 10 |
contamination | Amount of contamination in the data set, proportion of outliers. Used to define threshold. | 0.1 |
max_features | Number of features to draw from X to train each base estimator. | 1.0 |
bootstrap_features | Whether features are drawn with replacement. | False |
check_detector | If True, check if base estimator is consistent with pyod standard. | True |
check_estimator | If True, check if base estimator is consistent with sklearn standard. Deprecated in pyod 0.6.9. Replaced by check_detector. | False |
n_jobs | Number of jobs to run in parallel for both fit and predict. | 1 |
random_state | Seed used by random number generator. | None |
combination | Method of combination: ‘average’ for average scores, ‘max’ for maximum scores. | ‘average’ |
verbose | Controls the verbosity of the building process. | 0 |
estimator_params | List of attributes to use as parameters when instantiating a new base estimator. | None |
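By default each ensemble member is a LOF detector; any PyOD detector can be passed as `base_estimator`. A sketch combining kNN detectors with the maximum score (the settings are illustrative):

```python
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.knn import KNN

# Ten kNN detectors, each trained on a random subset of the features,
# combined by taking the maximum score across the ensemble.
clf = FeatureBagging(base_estimator=KNN(n_neighbors=5),
                     n_estimators=10,
                     max_features=1.0,
                     combination='max',
                     contamination=0.1,
                     random_state=42)
clf.fit(X)                    # X as in the earlier sketches
scores = clf.decision_scores_
```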
from pyod.models.abod import ABOD
ABOD[@kriegel2008angle]
ABOD(contamination=0.1, n_neighbors=5, method='fast')
Parameter | Description | Default |
---|---|---|
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
n_neighbors | int, optional (default=10) Number of neighbors to use by default for k neighbors queries. | 10 |
method | str, optional (default=‘fast’) Method for ABOD: ‘fast’ for fast ABOD using n_neighbors only, ‘default’ for original ABOD using all training points (could be slower). | ‘fast’ |
from pyod.models.iforest import IForest
IForest[@liu2008isolation]
IForest(n_estimators=100,
        max_samples='auto',
        contamination=0.1,
        max_features=1.0,
        bootstrap=False,
        n_jobs=1,
        behaviour='old',
        random_state=None,
        verbose=0)
Parameter | Description | Default Value |
---|---|---|
n_estimators | The number of base estimators in the ensemble. | 100 |
max_samples | The number of samples to draw from X to train each base estimator. If int, then draw max_samples samples. If float, draw max_samples * X.shape[0] samples. If “auto”, then max_samples=min(256, n_samples). | “auto” |
contamination | The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
max_features | The number of features to draw from X to train each base estimator. If int, then draw max_features features. If float, draw max_features * X.shape[1] features. | 1.0 |
bootstrap | If True, individual trees are fit on random subsets of the training data sampled with replacement. If False, sampling without replacement is performed. | False |
n_jobs | The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores. | 1 |
behaviour | Behaviour of the decision_function. Options: ‘old’ or ‘new’. ‘old’ is deprecated in sklearn 0.20 and will not be possible in 0.22. ‘new’ becomes dependent on the contamination parameter, with 0 being the natural threshold. | ‘old’ |
random_state | Seed used by the random number generator. If int, random_state is the seed. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random. | None |
verbose | Controls the verbosity of the tree building process. | 0 |
from pyod.models.hbos import HBOS
HBOS[@goldstein2012histogram]
HBOS(n_bins=10, alpha=0.1, tol=0.5, contamination=0.1)
Parameter | Description | Default |
---|---|---|
n_bins | int or string, optional (default=10) The number of bins. “auto” uses the birge-rozenblac method for automatic selection of the optimal number of bins for each feature. | 10 |
alpha | float in (0, 1), optional (default=0.1) The regularizer for preventing overflow. | 0.1 |
tol | float in (0, 1), optional (default=0.5) The parameter deciding the flexibility when dealing with samples that fall outside the bins. | 0.5 |
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
from pyod.models.sos import SOS
SOS[@janssens2012stochastic]
SOS(contamination=0.1, perplexity=4.5, metric='euclidean', eps=1e-05)
Parameter | Description | Default |
---|---|---|
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
perplexity | float, optional (default=4.5) A smooth measure of the effective number of neighbors. Perplexity is similar to parameter k in the kNN algorithm (number of nearest neighbors). Perplexity range: 1 to n-1, where n is the number of samples. | 4.5 |
metric | str, default ‘euclidean’ Metric used for distance computation. Can use any metric from scipy.spatial.distance. Valid values: ‘euclidean’, [‘braycurtis’, ‘canberra’, ‘chebyshev’, …]. See scipy.spatial.distance documentation for details. | ‘euclidean’ |
eps | float, optional (default=1e-5) Tolerance threshold for floating point errors. | 1e-5 |
from pyod.models.so_gaal import SO_GAAL
SO_GAAL[@liu2019generative]
SO_GAAL(stop_epochs=20,
        lr_d=0.01,
        lr_g=0.0001,
        momentum=0.9,
        contamination=0.1)
Parameter | Description | Default |
---|---|---|
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
stop_epochs | int, optional (default=20) The number of epochs of training. Total epochs equals three times stop_epochs. | 20 |
lr_d | float, optional (default=0.01) The learning rate of the discriminator. | 0.01 |
lr_g | float, optional (default=0.0001) The learning rate of the generator. | 0.0001 |
momentum | float, optional (default=0.9) The momentum parameter for SGD. | 0.9 |
from pyod.models.mo_gaal import MO_GAAL
MO_GAAL[@liu2019generative]
MO_GAAL(k=10,
        stop_epochs=20,
        lr_d=0.01,
        lr_g=0.0001,
        momentum=0.9,
        contamination=0.1)
Parameter | Description | Default |
---|---|---|
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. | 0.1 |
k | int, optional (default=10) The number of sub generators. | 10 |
stop_epochs | int, optional (default=20) The number of epochs of training. Total epochs equals three times stop_epochs. | 20 |
lr_d | float, optional (default=0.01) The learning rate of the discriminator. | 0.01 |
lr_g | float, optional (default=0.0001) The learning rate of the generator. | 0.0001 |
momentum | float, optional (default=0.9) The momentum parameter for SGD. | 0.9 |
from pyod.models.lscp import LSCP
LSCP[@zhao2019lscp]
LSCP(detector_list,
     local_region_size=30,
     local_max_features=1.0,
     n_bins=10,
     random_state=None,
     contamination=0.1)
Parameter | Description | Default |
---|---|---|
detector_list | List, length must be greater than 1 Base unsupervised outlier detectors from PyOD. Requires fit and decision_function methods. | - |
local_region_size | int, optional (default=30) Number of training points to consider in each iteration of local region generation process (30 by default). | 30 |
local_max_features | float in (0.5, 1.), optional (default=1.0) Maximum proportion of number of features to consider when defining local region (1.0 by default). | 1.0 |
n_bins | int, optional (default=10) Number of bins to use when selecting the local region. | 10 |
random_state | RandomState, optional (default=None) A random number generator instance to define the state of the random permutations generator. | None |
contamination | float in (0., 0.5), optional (default=0.1) The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function (0.1 by default). | 0.1 |
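Unlike the other detectors, LSCP requires an explicit `detector_list` of PyOD base detectors (length greater than 1) exposing fit and decision_function. A minimal sketch along the lines of the PyOD examples, using LOF detectors with varying neighborhood sizes (the particular list and seed are illustrative):

```python
from pyod.models.lof import LOF
from pyod.models.lscp import LSCP

# A pool of LOF detectors with different neighborhood sizes.
detector_list = [LOF(n_neighbors=k) for k in range(10, 60, 5)]

clf = LSCP(detector_list,
           local_region_size=30,
           contamination=0.1,
           random_state=42)
clf.fit(X)                    # X as in the earlier sketches
labels = clf.labels_
```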